Preface

This  paper  was  originally  intended  as  a  report  on  some  large  sample 
results  for  quasidensity  and  pattern  probability  estimation,  results  that 
provide  a  foundation  for  our  methods  for  ability  distribution  estimation  and 
item  response  function  estimation.  While  attempting  to  determine  the 
generality  of  one  of  the  results,  I  happened  upon  a  promising  new  line  of 
research.  The  new  methods  and  results  (Section  Three)  admittedly  haven't 
been  integrated  into  this  report  very  well.  The  main  result,  relating 
models  with  a  continuum  of  abilities  to  finite  models,  seems  at  least  as 
important  as  the  large  sample  results.  (It  asserts  that  every  item  response 
model  with  a  smooth  ability  distribution  and  smooth  item  response  functions 
is  isomorphic  to  a  latent  class  model  obtained  by  replacing  the  ability 
distribution  with  a  discrete  distribution.)  The  reader  primarily  interested 
in  the  generality  of  latent  class  models  may  wish  to  skim  Sections  One  and 
Two  for  notation  and  then  read  Section  Three.  Two  separate  papers 
eventually  will  be  prepared  for  publication. 


ABILITY  DISTRIBUTIONS,  PATTERN  PROBABILITIES,  AND  QUASIDENSITIES 


Introduction 

This  paper  solves  a  problem  closely  related  to  ability  distribution 
estimation,  which  is  arguably  the  central  problem  of  item  response  theory. 
The  problem,  quasidensity  estimation,  comes  up  when  one  needs  to  know  the 
probability  of  sampling  an  examinee  with  a  specified  item  response  pattern 
from  a  very  large  pool  of  examinees.  The  situation  in  which  item  response 
functions  are  specified  but  nothing  is  known  about  the  distribution  of 
ability  is  considered  in  this  paper. 

If  the  ability  distribution  has  a  density,  then  the  density  can  be  used 
to  calculate  the  probability  of  sampling  an  examinee  with  a  specified 
pattern.  A  pattern's  probability  is  simply  the  integral  of  the  product  of 
the  pattern's  likelihood  function  times  the  ability  density. 

It  is  shown  that  even  if  the  ability  distribution  is  a  step  function  or 
some  other  distribution  that  doesn't  have  a  density  there  is  an  essentially 
unique,  continuous  function  that  can  be  used  in  place  of  a  density  to 
compute  pattern  probabilities.  The  integral  of  the  product  of  this  function 
(the  quasidensity)  and  any  pattern's  likelihood  function  is  exactly  equal  to 
the  pattern's  probability. 

When  the  ability  distribution  is  unknown,  estimates  must  be  used. 
Quasidensity  estimation  is  easier  than  ability  distribution  estimation 
because  the  quasidensity  is  identifiable,  but  the  ability  distribution  is 
not.  Many  different  ability  distributions  will  fit  large  samples  equally 
well  (Levine,  1989).  By  contrast,  the  maximum  likelihood  quasidensity 
estimate is unique (Section II.2, below).

Quasidensity  estimation  is  related  to  ability  distribution  estimation 
in  two  ways.  First,  under  general  conditions  the  indefinite  integral  of  the 
quasidensity equals or closely approximates the cumulative distribution of
ability.  Second,  the  methods  of  quasidensity  estimation  have  been 
generalized  to  obtain  a  nonparametric  theory  for  ability  density  estimation. 

As  others  (Lord,  1970;  Samejima,  1981)  have  observed,  item  response 
function  estimation  is  closely  related  to  ability  distribution  estimation. 
The  results  in  this  paper  are  central  to  our  current  work  on  nonparametric 
item  response  function  and  option  response  function  estimation. 

A  quasidensity  estimation  theory  is  developed  in  this  paper.  The 
quasidensity  is  represented  as  a  linear  combination  of  orthogonal  functions. 
The  set  of  linear  combination  coefficients  is  shown  to  be  convex  and 
compact.  It  is  shown  that  the  maximum  likelihood  estimate  of  the 
coefficients  is  strongly  consistent.  The  asymptotic  distribution  of  the 
coefficients  is  derived. 

Some  general  results  on  ability  distributions  are  also  proven.  For 
example,  it  is  shown  that  for  the  most  commonly  used  item  response  models, 
every  ability  distribution  is  equivalent  to  a  distribution  with  only 
finitely many points of increase. An upper bound for the number of points
of increase is obtained.


Section  One 

Quasidensities, Parameterizations, and Approximations

In many applications it is necessary to compute the probability of
sampling an examinee with a specified item response pattern. For example,
pattern  probabilities  are  needed  in  situations  in  which  it  is  important  to 
decide  whether  because  of  cheating,  language  problems  or  an  ill-advised 
test-taking  strategy,  an  individual's  test-taking  behavior  is  so  unlike 
other  examinees'  that  his/her  test  score  is  virtually  uninterpretable.  With 
pattern  probabilities  a  uniformly  most  powerful  statistical  test  for  faulty 
answer  sheets  can  be  computed  (Levine  and  Drasgow,  1988). 

In  this  section  a  general  strategy  for  computing  pattern  probabilities 
is  derived  and  discussed.  We  begin  with  some  notation  and  a  brief  review  of 
basic  item  response  theory. 

Section  1.1:  Terminology,  Notation  and  Assumptions 

Let $u^* = \langle u_1^*, u_2^*, \ldots, u_n^* \rangle$ denote a vector of ones and zeros
indicating right and wrong answers to $n$ test items. Sampled vectors are
locally independent relative to a random variable $\theta$ if the conditional
probability of sampling pattern $u^*$ given any value of $\theta$ can be factored
and written

$$\mathrm{Prob}\{u = u^* \mid \theta = t\} = \prod_{i=1}^{n} \mathrm{Prob}\{u_i = u_i^* \mid \theta = t\}$$

where $u$ denotes the sampled vector, $u_i$ is its $i$th component and $t$ is
one of the possible values of $\theta$.

Usually in item response theory the item response functions
$P_i(t) = \mathrm{Prob}\{u_i = 1 \mid \theta = t\}$ are assumed to have some specific
(generally logistic) functional form with values strictly between zero and
one. The results in this paper (except for the last part of Section III)
assume only that the item response functions are continuous functions with
values that are strictly between zero and one.


Usually  item  response  functions  are  defined  over  an  unbounded  range  and 
applied  over  a  finite  range,  typically  the  interval  [-3,3]  .  The  results 
in  this  paper  only  use  item  response  function  values  over  a  finite  range  of 
abilities. Thus, throughout this paper the $P_i$ are continuous functions
with values strictly between zero and one that are defined on some finite
closed interval $[c,d]$. This results in no loss of generality because the
interval can be very large and because an unbounded ability continuum can be
transformed into an interval. In applications, we make the interval $[c,d]$
big enough so that the assumption that all abilities are in $[c,d]$ is
plausible.

This  paper  is  concerned  with  distributions  on  [c,d]  ,  i.e. 
distributions  of  random  variables  that  are  between  c  and  d  with 
probability  one.  The  condition  that  a  distribution  function  G  is  a 
distribution  on  [c,d]  can  be  expressed  without  explicitly  referring  to 
random variables as follows: for $t < c$, $G(t) = 1 - G(d)$ (so that both sides equal zero).

To  summarize,  the  results  to  follow  assume 

1. local independence relative to a unidimensional random variable
$\theta$,

2. continuous item response functions defined on an interval $[c,d]$
and taking values strictly between zero and one, and

3. $\mathrm{Prob}\{c \le \theta \le d\} = 1$.

Section 1.2: The Canonical Space and its Quasidensities

The assumption that the $P_i$ are strictly between zero and one implies
that for every pattern $u^*$, the pattern likelihood function $\ell(u^*, \cdot)$ given
by

$$\ell(u^*, t) = \mathrm{Prob}\{u = u^* \mid \theta = t\} = \prod_{i=1}^{n} P_i(t)^{u_i^*}\,[1 - P_i(t)]^{1 - u_i^*}$$

will also be a continuous, positive function defined on $[c,d]$. By forming
linear combinations of likelihood functions we obtain a finite dimensional
real vector space which is called the test's canonical space (CS). Thus $f$
is in the CS if and only if for some real constants $a_\nu$

$$f(\cdot) = \sum_{\nu=1}^{2^n} a_\nu\, \ell(u_\nu^*, \cdot)$$

where $u_1^*, u_2^*, \ldots, u_\nu^*, \ldots$ is any enumeration of the $2^n$ possible item
response patterns. Since the functions in the CS are continuous, an inner
product for the CS is defined by $\langle f, g \rangle = \int_c^d f(t)g(t)\,dt$.
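The definitions above can be sketched numerically. The following is a minimal illustration, not part of the original report: it assumes three hypothetical two-parameter logistic items (the parameter values in `ITEMS` are invented), the ability interval $[-3,3]$, and a midpoint rule standing in for the integral in the inner product. It evaluates pattern likelihoods, checks that the $2^n$ likelihoods sum to one at a fixed ability, and computes pattern probabilities under a uniform ability density.

```python
import math
from itertools import product

# Hypothetical two-parameter logistic items (a_i, b_i); illustrative values only.
ITEMS = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]
C, D = -3.0, 3.0  # assumed ability interval [c, d]


def irf(a, b, t):
    """Item response function P_i(t), strictly between zero and one."""
    return 1.0 / (1.0 + math.exp(-a * (t - b)))


def likelihood(u, t):
    """Pattern likelihood: product of P_i(t)^u_i * [1 - P_i(t)]^(1 - u_i)."""
    value = 1.0
    for (a, b), ui in zip(ITEMS, u):
        p = irf(a, b, t)
        value *= p if ui == 1 else 1.0 - p
    return value


def inner(f, g, m=2000):
    """Midpoint-rule approximation of <f, g> = integral of f(t) g(t) over [c, d]."""
    h = (D - C) / m
    return h * sum(f(C + (k + 0.5) * h) * g(C + (k + 0.5) * h) for k in range(m))


patterns = list(product([0, 1], repeat=len(ITEMS)))

# At any fixed ability the 2^n pattern likelihoods sum to one.
total_at_fixed_ability = sum(likelihood(u, 0.3) for u in patterns)


def uniform_density(t):
    """Density of the uniform distribution on [c, d]."""
    return 1.0 / (D - C)


# Pattern probability = <likelihood, density> when the distribution has a density.
probs = {u: inner(lambda t, u=u: likelihood(u, t), uniform_density) for u in patterns}
```

Under these assumptions `probs` holds one positive number per pattern and the eight values sum to one up to rounding.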

Note  that  since  the  likelihood  functions  may  be  linearly  dependent,  it 
may  be  possible  to  write  a  function  as  a  linear  combination  of  likelihood 
functions  in  several  ways.  The  uniqueness  referred  to  in  the  following 
result applies to functions, not vectors of coefficients $(a_\nu)$.

I.1: There is a unique function $g$ in the CS such that for all response
patterns $u^*$

$$\mathrm{Prob}\{u = u^*\} = \int_c^d \ell(u^*, t)\, g(t)\, dt .$$

Equivalently, for any distribution $G$ on $[c,d]$ there is a unique $g$
in the CS such that

$$\int_c^d \ell(u^*, t)\, dG(t) = \int_c^d \ell(u^*, t)\, g(t)\, dt .$$


Proof: Write $\ell_\nu$ for $\ell(u_\nu^*, \cdot)$. The formula

$$h = \sum_\nu a_\nu \ell_\nu \;\mapsto\; E[h(\theta)] = \sum_\nu a_\nu\, \mathrm{Prob}\{u = u_\nu^*\}$$

defines a linear mapping on the CS since expectation is linear. Since every
linear functional defined on an inner product space that is isomorphic to a
Euclidean space can be written as an inner product, there is some $g$ in the
CS such that the mapping can be written

$$\sum_\nu a_\nu \ell_\nu(\cdot) \;\mapsto\; \Big\langle \sum_\nu a_\nu \ell_\nu(\cdot),\, g \Big\rangle .$$

In particular, if all the $a$'s are zero except one, then for every $u_\nu^*$

$$\mathrm{Prob}\{u = u_\nu^*\} = \langle \ell(u_\nu^*, \cdot),\, g \rangle .$$

If $\tilde g$ also satisfies these conditions then for all $\nu$

$$0 = \langle \ell_\nu, g \rangle - \langle \ell_\nu, \tilde g \rangle = \langle \ell_\nu,\, g - \tilde g \rangle .$$

Thus $g - \tilde g = 0$ because no nonzero element of a vector space can be
orthogonal to all of the vector space's generators. //

The  function  g  is  called  the  quasidensity  of  G  because  it  functions 
like  a  density  in  the  calculation  of  pattern  probabilities.  In  fact,  it  can 
be  used  in  place  of  a  density  to  calculate  the  expected  value  of  any 
statistic  that  is  a  function  of  item  responses  (Levine,  1989,  Section  2). 
Although the quasidensity integrates to one, it is generally not
nonnegative. A discussion of its properties can be found in Levine (1989).

When  an  orthonormal  basis  for  the  CS  is  available  the  quasidensity 
has  a  simple  formula  which  is  often  used. 
I.2: If $\theta$ has distribution $G$ where $G$ is a distribution on $[c,d]$ and
$h_0, h_1, \ldots, h_J$ is an orthonormal basis for the canonical space, then the
quasidensity for $G$ is

$$g(\cdot) = \sum_{j=0}^{J} E[h_j(\theta)]\, h_j(\cdot) .$$

Proof: Since $\{h_j\}$ is an orthonormal basis, each pattern likelihood
function satisfies $\ell(u_\nu^*, t) = \sum_j \langle \ell_\nu, h_j \rangle h_j(t)$. Consequently

$$\begin{aligned}
\mathrm{Prob}\{u = u_\nu^*\} &= \int_c^d \sum_j \langle \ell_\nu, h_j \rangle h_j(t)\, dG(t) \\
&= \sum_j \langle \ell_\nu, h_j \rangle\, E[h_j(\theta)] \\
&= \sum_j \int_c^d \ell_\nu(t) h_j(t)\, dt \; E[h_j(\theta)] \\
&= \int_c^d \ell(u_\nu^*, t) \sum_j E[h_j(\theta)]\, h_j(t)\, dt .
\end{aligned}$$

From uniqueness proven in I.1 it follows that $g(\cdot) = \sum_j E[h_j(\theta)]\, h_j(\cdot)$. //

The  value  of  the  quasidensity  in  studying  pattern  probabilities  derives 
from  the  following  obvious  but  very  useful  fact: 

I.3: If $\{h_j\}_{j=0}^{J}$ is a basis for the CS and the quasidensity $g$ of the
distribution of $\theta$ satisfies

$$g(\cdot) = \sum_{j=0}^{J} \pi_j h_j(\cdot)$$

then

$$\mathrm{Prob}\{u = u_\nu^*\} = \sum_j \pi_j \langle h_j, \ell_\nu \rangle .$$

Thus, each pattern probability can also be represented as an inner product
of a known vector depending only on the pattern and a vector that must be
estimated.
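Results I.1 through I.3 can be checked numerically for a distribution that has no density. The sketch below is an illustration under stated assumptions, not the report's own procedure: the item parameters in `ITEMS` are hypothetical, the inner product is replaced by a midpoint sum on a grid, the orthonormal basis is obtained by Gram-Schmidt applied to the likelihood functions (with an arbitrary tolerance for dropping dependent functions), and the ability distribution is a two-point step function whose jumps sit on grid nodes.

```python
import math
from itertools import product

# Hypothetical two-parameter logistic items (a_i, b_i); illustrative values only.
ITEMS = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]
C, D, M = -3.0, 3.0, 600
STEP = (D - C) / M
GRID = [C + (k + 0.5) * STEP for k in range(M)]  # midpoint quadrature nodes


def likelihood(u, t):
    """Pattern likelihood for pattern u at ability t."""
    value = 1.0
    for (a, b), ui in zip(ITEMS, u):
        p = 1.0 / (1.0 + math.exp(-a * (t - b)))
        value *= p if ui else 1.0 - p
    return value


def inner(f, g):
    """Discrete stand-in for <f, g>; f and g are lists of values on GRID."""
    return STEP * sum(x * y for x, y in zip(f, g))


patterns = list(product([0, 1], repeat=len(ITEMS)))
ells = {u: [likelihood(u, t) for t in GRID] for u in patterns}

# Gram-Schmidt on the likelihood functions gives an orthonormal basis h_0..h_J.
basis = []
for u in patterns:
    r = list(ells[u])
    for _ in range(2):  # two passes keep the basis numerically orthogonal
        for h in basis:
            coef = inner(r, h)
            r = [x - coef * y for x, y in zip(r, h)]
    norm = math.sqrt(inner(r, r))
    if norm > 1e-8:  # assumed tolerance: smaller residuals count as dependent
        basis.append([x / norm for x in r])

# A step-function ability distribution: point masses at two grid nodes.
masses = {150: 0.3, 420: 0.7}

# Coordinates pi_j = E[h_j(theta)] and the quasidensity g = sum_j pi_j h_j.
pi = [sum(w * h[k] for k, w in masses.items()) for h in basis]
g = [sum(p * h[k] for p, h in zip(pi, basis)) for k in range(M)]

# Pattern probabilities computed directly from G and through the quasidensity.
direct = {u: sum(w * ells[u][k] for k, w in masses.items()) for u in patterns}
via_g = {u: inner(ells[u], g) for u in patterns}
max_gap = max(abs(direct[u] - via_g[u]) for u in patterns)
mass_of_g = STEP * sum(g)
```

In this discretized setting `max_gap` is at the level of rounding error and `mass_of_g` is close to one, which mirrors the statement that the quasidensity reproduces every pattern probability and integrates to one even though the step function has no density.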
Section 1.3: Approximations and Bases

For applications, the size of $J$ is important. For the Rasch model
($P_i(t) = [1 + e^{-(t - b_i)}]^{-1}$), $J$ cannot be larger than the number of items, $n$
(Levine, 1989, Section 2). On the other hand for the three parameter
logistic model ($P_i(t) = c_i + (1 - c_i)[1 + e^{-a_i(t - b_i)}]^{-1}$), $J$ may be as large
as $2^n$. Fortunately by careful choice of the functions $h_j$, approximations
can be obtained that have very few terms but are still accurate in one sense
or another. Two examples follow.

For a first example suppose it is desirable to keep the total squared
error

$$\sum_\nu \big[\mathrm{Prob}\{u = u_\nu^*\} - \text{approximated } \mathrm{Prob}\{u = u_\nu^*\}\big]^2$$

small. A basis can be obtained by analyzing the function of two variables

$$H(s,t) = \prod_{i=1}^{n} \big\{ P_i(s)P_i(t) + [1 - P_i(s)][1 - P_i(t)] \big\}$$

defined for $c \le s, t \le d$. By solving the functional equation

$$\int_c^d H(s,t)\, h(s)\, ds = \lambda\, h(t)$$

one obtains a maximal set of orthonormal functions $h_j$ in the CS and
positive constants $\lambda_0 \ge \lambda_1 \ge \cdots \ge \lambda_J > 0$ such that

$$H(s,t) = \sum_{j=0}^{J} \lambda_j h_j(s) h_j(t) .$$

These functions can be shown to form an orthonormal basis for the CS
(Levine, 1989). From this fact and the identity

$$H(s,t) = \sum_{\nu=1}^{2^n} \ell(u_\nu^*, s)\, \ell(u_\nu^*, t)$$

it follows that the total squared error with a $K < J$ term approximation
satisfies

$$\sum_\nu \Big[\mathrm{Prob}\{u = u_\nu^*\} - \sum_{j<K} \pi_j \langle \ell_\nu, h_j \rangle\Big]^2 = \sum_{j \ge K} \lambda_j \langle g, h_j \rangle^2 \le \lambda_K \int_c^d g^2(t)\, dt$$

when $\pi_j = \langle g, h_j \rangle$.

This relation is important because for all tests we have analyzed, the
$\lambda_j$ very rapidly decrease to zero. Typically, $K = 15$ provides very accurate
least squares pattern probability approximation.
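The identity relating the product form of $H(s,t)$ to the sum over all $2^n$ patterns can be checked directly. The sketch below is only an illustration: the three two-parameter logistic items in `ITEMS` are hypothetical, and the evaluation points are arbitrary. It does not solve the eigenfunction equation, which would need a numerical eigensolver.

```python
import math
from itertools import product

# Hypothetical two-parameter logistic items (a_i, b_i); illustrative values only.
ITEMS = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]


def irf(a, b, t):
    """Item response function P_i(t)."""
    return 1.0 / (1.0 + math.exp(-a * (t - b)))


def likelihood(u, t):
    """Pattern likelihood for pattern u at ability t."""
    value = 1.0
    for (a, b), ui in zip(ITEMS, u):
        p = irf(a, b, t)
        value *= p if ui else 1.0 - p
    return value


def kernel_product(s, t):
    """H(s, t) as a product over items (n factors)."""
    value = 1.0
    for a, b in ITEMS:
        ps, pt = irf(a, b, s), irf(a, b, t)
        value *= ps * pt + (1.0 - ps) * (1.0 - pt)
    return value


def kernel_sum(s, t):
    """H(s, t) as a sum over all 2^n response patterns."""
    patterns = product([0, 1], repeat=len(ITEMS))
    return sum(likelihood(u, s) * likelihood(u, t) for u in patterns)


points = [(s, t) for s in (-2.5, 0.0, 1.7) for t in (-1.0, 0.4, 3.0)]
max_gap = max(abs(kernel_product(s, t) - kernel_sum(s, t)) for s, t in points)
```

The product form needs only $n$ factors per evaluation, whereas the sum visits all $2^n$ patterns, so the product form is the practical way to tabulate $H$ for longer tests.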

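To introduce a second and final example concerning the choice of a CS
basis, suppose it is important to control the maximum absolute error. In
addition suppose an approximation of the distribution function $G$ is
available. Then it is often possible to select an orthonormal basis $h_j$
such that $E[h_j(\theta)]$ will be small for large $j$. Of course, small
$E[h_j(\theta)]$ guarantees that the finite sum $\sum_{j \ge K} |E[h_j(\theta)]|$ is also small.
This is important because for $\pi_j = \langle g, h_j \rangle = E[h_j(\theta)]$ the $K$ term
approximation satisfies

$$\begin{aligned}
\Big|\mathrm{Prob}\{u = u_\nu^*\} - \sum_{j<K} \pi_j \langle \ell_\nu, h_j \rangle\Big|
&= \Big|\sum_{j \ge K} \pi_j \langle \ell_\nu, h_j \rangle\Big| \\
&\le \Big\{\sum_{j \ge K} |\pi_j|\Big\}^{1/2} \Big\{\sum_{j \ge K} |\pi_j| \langle \ell_\nu, h_j \rangle^2\Big\}^{1/2} \\
&\le \Big\{\sum_{j \ge K} |\pi_j|\Big\}^{1/2} \max_{j \ge K} |\pi_j|^{1/2} \Big\{\sum_{j \ge K} \langle \ell_\nu, h_j \rangle^2\Big\}^{1/2} \\
&\le \Big\{\sum_{j \ge K} |\pi_j|\Big\}^{1/2} \max_{j \ge K} |\pi_j|^{1/2} \Big\{\int_c^d \ell_\nu^2(t)\, dt\Big\}^{1/2} \\
&\le \max_{j \ge K} |\pi_j|^{1/2} \Big\{\sum_{j \ge K} |\pi_j|\Big\}^{1/2} (d - c)^{1/2} \max_{c \le t \le d} \ell_\nu(t) .
\end{aligned}$$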


This paper is concerned with using maximum likelihood estimation of the
$\pi_j$ from sampled patterns to obtain approximations of pattern probabilities.
It will be shown that the maximum likelihood estimate for the vector $\pi$ is
strongly consistent and asymptotically efficient. In addition some results
about the set of vectors $\pi$ corresponding to distributions are proven.




Section  Two 

Uniqueness  and  Consistency  of  the  Maximum  Likelihood  Estimate 

In the remainder of this paper $F$ is used to denote the unknown
distribution of the ability random variable $\theta$, $\{h_j\}_{j=0}^{J}$ is a fixed
orthonormal basis for the CS, and $\pi$ is the vector of expectations with $j$th
coordinate $E[h_j(\theta)]$. Thus the pattern probability for $u^*$ can be
written

$$P_\pi(u^*) = \sum_j \langle \ell(u^*, \cdot), h_j \rangle\, \pi_j$$

or with the abbreviation $\beta_j(u^*) = \langle \ell(u^*, \cdot), h_j \rangle$, $P_\pi(u^*) = \beta(u^*) \cdot \pi$. If $G$
is the distribution of $\theta$ or any other distribution on $[c,d]$ and
$g_{\pi'}(\cdot) = \sum_j \pi'_j h_j(\cdot)$ is its quasidensity then the pattern probabilities obtained by
using $G$ in place of $F$ are given by

$$P_{\pi'}(u^*) = \int_c^d \ell(u^*, t)\, dG(t) = \beta(u^*) \cdot \pi' .$$

This section begins the task of estimating $F$'s quasidensity, or
equivalently the vector $\pi$, from a sample of patterns $u^*$.

Section  II. 1:  Distributions  Viewed  as  Points  in  a  Convex,  Compact  Subset 
of  Euclidean  Space 

Procedures for recovering the unknown $\pi$ from samples of observed
patterns have been developed. Each requires the maximization of some
continuous function defined over a set $A$ of vectors corresponding to
quasidensities

$$A = \Big\{ \pi' \text{ in } E^{J+1} : \sum_j \pi'_j h_j(\cdot) \text{ is the quasidensity of at least one distribution on } [c,d] \Big\} .$$


The  set  A  has  three  properties  that  greatly  simplify  maximization: 


i. Convexity: the line segment connecting any two points in
$A$ is also in $A$.

ii. Boundedness: There is a constant $k$ such that for all $\pi'$
in $A$, $|\pi' - \pi| = \langle \pi' - \pi, \pi' - \pi \rangle^{1/2} < k$.

iii.  Closure:  Every  continuous,  bounded  function  defined  on  A 
has  a  maximizer  in  A  . 

The  proof  requires  a  simple  result  that  is  repeatedly  used  elsewhere. 

II.1: The vectors $\{\beta(u_\nu^*)\}_{\nu=1}^{2^n}$ span a $J+1$ dimensional vector space.
Equivalently, for any choice of positive constants $w_\nu$ the matrix

$$\sum_{\nu=1}^{2^n} \beta(u_\nu^*)\, \beta(u_\nu^*)^T\, w_\nu$$

is positive definite.

Proof: Let $\ell_{\nu_1}, \ell_{\nu_2}, \ldots, \ell_{\nu_{J+1}}$ be likelihood functions forming a basis for
the CS. Since $\ell_{\nu_i}(\cdot) = \sum_j \beta_j(u_{\nu_i}^*)\, h_j(\cdot)$, linear independence of the $\ell_{\nu_i}$
implies linear independence of the $\beta(u_{\nu_i}^*)$. Thus the set $\{\beta(u_\nu^*)\}_{\nu=1}^{2^n}$ of
$(J+1)$-vectors contains $J+1$ linearly independent vectors. //

II.2: The set $A$ of coordinates of distributions on $[c,d]$ is convex,
closed and bounded.

Proof: (i) Convexity: For $\pi^1$ and $\pi^2$ in $A$ and $0 < \varepsilon < 1$ let $G_1$ and
$G_2$ be distributions on $[c,d]$ with quasidensities $g_{\pi^1}$ and $g_{\pi^2}$
respectively. Since for any such $\varepsilon$, $G_\varepsilon = \varepsilon G_1 + (1 - \varepsilon) G_2$ is also a
distribution on $[c,d]$ and since for any $h$ in the CS

$$\int_c^d h(t)\, d[\varepsilon G_1(t) + (1 - \varepsilon) G_2(t)] = \varepsilon \int_c^d h(t)\, dG_1(t) + (1 - \varepsilon) \int_c^d h(t)\, dG_2(t) ,$$

it follows from I.1 that $g_{\varepsilon \pi^1 + (1 - \varepsilon)\pi^2}$ is the quasidensity of $G_\varepsilon$. Thus a
convex combination of points in $A$ is in $A$.
(ii) Boundedness: Let $\pi'$ be any vector in $A$ other than $\pi$. Let $\tilde\pi$
be the intersection of the unit sphere about $\pi$ and the ray from $\pi$
passing through $\pi'$. Thus $|\tilde\pi - \pi| = 1$ and $\tilde\pi = [\pi' - \pi]/k + \pi$ (or
equivalently, $\pi' = \pi + k(\tilde\pi - \pi)$) for $k = |\pi - \pi'| > 0$. Since $\pi' \cdot \beta(u^*) = P_{\pi'}(u^*)$
and $\pi \cdot \beta(u^*) = P_\pi(u^*)$ are probabilities,

$$0 < P_{\pi'}(u^*) = [\pi + k(\tilde\pi - \pi)] \cdot \beta(u^*) < 1$$

and

$$-1 < -P_\pi(u^*) < k(\tilde\pi - \pi)^T \beta(u^*) < 1 - P_\pi(u^*) < 1 .$$

After squaring and summing over all $2^n$ patterns we obtain

$$0 < k^2 (\tilde\pi - \pi)^T \Big[\sum_{u^*} \beta(u^*) \beta(u^*)^T\Big] (\tilde\pi - \pi) < 2^n .$$

But in II.1, $\sum_{u^*} \beta(u^*) \beta(u^*)^T$ was shown to be positive definite. Since
$|\tilde\pi - \pi| = 1$, the expression multiplying $k^2$ must be at least as large as
the smallest eigenvalue $\lambda_{\min}$ of this matrix. Thus $k^2 < 2^n / \lambda_{\min}$, so every $\pi'$
in $A$ satisfies $|\pi' - \pi| < (2^n / \lambda_{\min})^{1/2}$, and $A$ is bounded.


(iii) Closure: Let $\{\pi^n\}$ be a Cauchy sequence in $A$. Since $A$ is
bounded, the sequence converges to some $\pi'$ in $E^{J+1}$. To show $\pi'$ is in
$A$, let $\{G_n\}$ be any sequence of distributions on $[c,d]$ such that $g_{\pi^n}$
is the quasidensity of $G_n$. By Helly's theorem, $\{G_n\}$ contains a
subsequence $\{G_{m(n)}\}$ such that $G_{m(n)}$ converges to some distribution $G$
on $[c,d]$ at every point of continuity of $G$. Since each $h_j$ is
continuous, $\int_c^d h_j(t)\, dG_{m(n)}(t) \to \int_c^d h_j(t)\, dG(t)$. Thus, by I.2,
$\pi_j^{m(n)} \to \int_c^d h_j(t)\, dG(t)$. Since $\{\pi^{m(n)}\}$ is a subsequence of $\{\pi^n\}$,
$\pi_j^{m(n)} \to \pi'_j$. Hence $\int_c^d h_j(t)\, dG(t) = \pi'_j$, and $g_{\pi'}$ is the quasidensity of $G$. //
c  j  J  it 


Note that since the CS is also a metric space, with the distance between
$g$ and $h$ given by $\langle g - h, g - h \rangle^{1/2}$, and since $\pi' \to g_{\pi'}$ is an isometry,
it follows that the subset of the CS corresponding to quasidensities of
distributions on $[c,d]$ is also convex and compact.


Section II.2: Uniqueness of the Maximum Likelihood Estimate

Consider now drawing a random sample of $N$ patterns

$$u_1^*, u_2^*, \ldots, u_a^*, \ldots, u_N^*$$

and attempting to recover the quasidensity $g_\pi$ by maximizing the sample
likelihood function $\prod_{a=1}^{N} P_{\pi'}(u_a^*)$ defined for vectors $\pi'$ in $A$ or its
logarithm

$$L_N(\pi') = \sum_{a=1}^{N} \log \pi' \cdot \beta(u_a^*) .$$

It will be shown that if the sample is large enough then almost surely $L_N$
has a unique maximum.

II.3: With probability one $L_N(\cdot)$ eventually has a unique maximizer in
$A$.

Proof: For each pattern $u^*$ and vector $\pi'$ in $A$, $P_{\pi'}(u^*) = \pi' \cdot \beta(u^*)$
is positive. Therefore, $L_N(\cdot)$ is defined and continuous on $A$. Since $A$
is compact, $L_N(\cdot)$ has at least one maximizer in $A$. Suppose both $\pi'$ and $\pi''$
maximize $L_N$ and $\pi' \ne \pi''$. Since the line segment connecting two points of $A$ is
entirely in $A$, a function of one variable is defined by

$$p(\varepsilon) = L_N(\pi' + \varepsilon(\pi'' - \pi'))$$

for $0 \le \varepsilon \le 1$. From the formula for $L_N$ and the fact that $\pi'$ and $\pi''$ are
maximizers, $p$ has two continuous derivatives, and $p(0) = p(1)$. Since the
second derivative of $p$ is the negative of a sum of squares

$$p''(\varepsilon) = -\sum_{a=1}^{N} w_a(\varepsilon)\, [(\pi' - \pi'') \cdot \beta(u_a^*)]^2$$

for $w_a(\varepsilon) = \{P_{\pi'}(u_a^*) + \varepsilon [P_{\pi''}(u_a^*) - P_{\pi'}(u_a^*)]\}^{-2} > 0$ and $\max_\varepsilon p(\varepsilon) = p(0) =
p(1)$, $p$ must be constant. Thus, the second derivative of $p$ evaluated
at, say, $\varepsilon = .5$ must be zero. However,

$$0 = -\sum_{a=1}^{N} w_a(.5)\, (\pi' - \pi'')^T \beta(u_a^*)\, \beta(u_a^*)^T (\pi' - \pi'')$$

implies that the $N$ vectors $\beta(u_a^*)$ span a subspace of dimension less than
$J+1$. Since each of the $2^n$ response patterns has positive probability of
being sampled and since by II.1 the full set spans a space of dimension
$J+1$, with probability one eventually a linearly independent set of $J+1$
patterns will be sampled, and the maximizer will henceforth be unique. //
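The concavity argument in the proof of II.3 rests on the closed form of the second derivative of the log likelihood along a line segment. The sketch below compares that closed form with a finite-difference estimate. Everything numeric here is an assumption made for illustration: `BETAS`, `PI_1`, and `PI_2` are invented two-dimensional vectors chosen only so that every inner product is positive, and they are not claimed to come from an actual test or to lie in $A$.

```python
import math

# Hypothetical beta(u_a*) vectors for N = 4 sampled patterns (J + 1 = 2).
BETAS = [(0.9, 0.2), (0.5, -0.3), (0.7, 0.4), (0.9, 0.2)]
PI_1 = (0.6, 0.1)  # stands in for pi'
PI_2 = (0.4, 0.5)  # stands in for pi''


def dot(x, y):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def log_lik(pi):
    """L_N(pi) = sum over sampled patterns of log(pi . beta(u_a*))."""
    return sum(math.log(dot(pi, b)) for b in BETAS)


def on_segment(eps):
    """Point pi' + eps * (pi'' - pi') on the segment joining the two vectors."""
    return tuple(p + eps * (q - p) for p, q in zip(PI_1, PI_2))


def second_derivative(eps):
    """Closed form: minus a weighted sum of squares, as in the proof."""
    diff = tuple(p - q for p, q in zip(PI_1, PI_2))
    point = on_segment(eps)
    return -sum(dot(diff, b) ** 2 / dot(point, b) ** 2 for b in BETAS)


def finite_difference(eps, h=1e-4):
    """Central second difference of eps -> L_N(on_segment(eps))."""
    def f(e):
        return log_lik(on_segment(e))
    return (f(eps + h) - 2.0 * f(eps) + f(eps - h)) / h ** 2


exact = second_derivative(0.5)
approx = finite_difference(0.5)
```

Because the closed form is a negative weighted sum of squares, it vanishes only when the difference vector is orthogonal to every sampled $\beta(u_a^*)$, which is the step the proof uses to conclude uniqueness.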

Section II.3: Strong Consistency of the Maximum Likelihood Estimate

In order to study the asymptotic behavior of $\hat\pi_N$, the maximum
likelihood estimate, it is convenient to have an open set containing $A$ on
which any $L_N$ can be extended to a differentiable function. To this end we
choose a positive number $d$ such that if a vector $x$ is within distance $d$
of at least one point of $A$, then $x \cdot \beta(u^*) \ne 0$ for all patterns $u^*$.

II.4: Let $d = \inf \bigcup_{u^*} \{ |x - \pi'| : \pi' \text{ is in } A \text{ and } x \cdot \beta(u^*) = 0 \}$ and

$$A^+ = \{ x : \text{for some } \pi' \text{ in } A,\ |x - \pi'| < d \} .$$

Then $A^+$ is an open set containing $A$ on which the formula

$$\sum_{a=1}^{N} \log x \cdot \beta(u_a^*)$$

extends $L_N$ to a differentiable function defined on $A^+$.

Proof: It remains only to show $d$ is positive. Since $A$ is compact, for
each $u^*$ the set $\{ |x - \pi'| : x \cdot \beta(u^*) = 0 \text{ and } \pi' \text{ is in } A \}$ has a positive
minimum. Thus $d$ is the minimum of $2^n$ positive numbers. //
If there is some $\pi' \ne \pi$ in $A$ such that $P_{\pi'}(\cdot)$ assigns exactly the same
probabilities to patterns as $P_\pi(\cdot)$ then it will not be possible to prove
$\hat\pi_N$ converges to $\pi$. Thus the following intrinsically important result is
needed.

II.5: If $\pi' \ne \pi''$ are both in $A$, then for at least one pattern $u^*$,
$P_{\pi'}(u^*) \ne P_{\pi''}(u^*)$.

Proof: If $P_{\pi'}(\cdot) = P_{\pi''}(\cdot)$, then for all $u^*$

$$0 = \beta(u^*) \cdot (\pi' - \pi'') .$$

Since from II.1 the $\beta(u^*)$ span a $J+1$ dimensional space, the $J+1$ vector
$\pi'$ must equal $\pi''$. //

Finally,  strong  consistency  for  the  maximum  likelihood  estimate  can  be 
proven  by  an  argument  Wald  used  to  prove  the  consistency  in  a  different 
context  (Wald,  1949). 

II.6: (Strong consistency of the mle) With probability 1, $\hat\pi_N$ converges to
$\pi$.

Proof: Using the inequality $\log x < x - 1$ for $x \ne 1$ and II.5 it follows that
for $\pi' \ne \pi$, $E \log P_{\pi'}(u) < E \log P_\pi(u)$. For if the set
$D = \{ u^* : P_{\pi'}(u^*) \ne P_\pi(u^*) \}$ is not empty then

$$\begin{aligned}
E[\log P_{\pi'}(u)] - E[\log P_\pi(u)]
&= E \log [P_{\pi'}(u) / P_\pi(u)] \\
&= \sum_{u^* \in D} P_\pi(u^*) \log [P_{\pi'}(u^*) / P_\pi(u^*)] \\
&< \sum_{u^* \in D} P_\pi(u^*)\, [P_{\pi'}(u^*) / P_\pi(u^*) - 1] \\
&= \sum_{u^* \in D} P_{\pi'}(u^*) - \sum_{u^* \in D} P_\pi(u^*) \\
&= \sum_{u^* \in D} P_{\pi'}(u^*) - \Big[1 - \sum_{u^* \notin D} P_\pi(u^*)\Big] \\
&= 0 .
\end{aligned}$$
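The chain of inequalities above can be traced on a small numerical case. The two probability vectors below are hypothetical stand-ins for $P_\pi$ and $P_{\pi'}$ over four patterns; they are chosen only for illustration and are not derived from any item response model.

```python
import math

# Hypothetical pattern probabilities over four patterns; illustrative only.
P_TRUE = [0.4, 0.3, 0.2, 0.1]       # stands in for P_pi(u*)
P_OTHER = [0.25, 0.25, 0.25, 0.25]  # stands in for P_pi'(u*)


def expected_log(p_model, p_true):
    """E log P_model(u) when u is sampled according to p_true."""
    return sum(pt * math.log(pm) for pm, pt in zip(p_model, p_true))


# Left side of the chain: E log P_pi'(u) - E log P_pi(u).
gap = expected_log(P_OTHER, P_TRUE) - expected_log(P_TRUE, P_TRUE)

# Upper bound obtained from log x < x - 1; it collapses to zero because
# both vectors sum to one.
bound = sum(pt * (pm / pt - 1.0) for pm, pt in zip(P_OTHER, P_TRUE))
```

Here `gap` is strictly negative while `bound` is zero up to rounding, matching the strict inequality in the proof.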

•k 

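To obtain a finite open covering of some subsets of $A$, note that $x \cdot \beta(u^*)$
is positive for $x$ in $A^+$. For $\pi' \ne \pi$ let $s(u^*, \pi', \rho) = \sup \{ \pi'' \cdot \beta(u^*) : \pi''
\text{ is in } A^+ \text{ and } |\pi' - \pi''| < \rho \}$. Since for $\pi'$ in $A$, $\pi''$ in $A^+$ and
$|\pi' - \pi''| < \rho$

$$\begin{aligned}
\pi'' \cdot \beta(u^*) &= [\pi' + (\pi'' - \pi')] \cdot \beta(u^*) \\
&\le P_{\pi'}(u^*) + |\pi'' - \pi'|\, |\beta(u^*)| \\
&\le P_{\pi'}(u^*) + \rho\, |\beta(u^*)| ,
\end{aligned}$$

$\log s(u^*, \pi', \rho) \le \log P_{\pi'}(u^*) + \rho\, |\beta(u^*)| / P_{\pi'}(u^*)$. Consequently

$$\begin{aligned}
E \log s(u, \pi', \rho) &\le E \log P_{\pi'}(u) + \rho\, E[\, |\beta(u)| / P_{\pi'}(u) ] \\
&= E \log P_\pi(u) - [\, E \log P_\pi(u) - E \log P_{\pi'}(u) ] + \rho\, E[\, |\beta(u)| / P_{\pi'}(u) ]
\end{aligned}$$

and for each $\pi' \ne \pi$ in $A$, a positive $\rho(\pi')$ less than $d$ can be selected
so that

$$E \log s(u, \pi', \rho(\pi')) < E \log P_\pi(u) .$$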

Let $B$ be a closed subset of $A$ not containing $\pi$. To show that with
probability one

$$\sup_{\pi'' \in B} \prod_{a=1}^{N} P_{\pi''}(u_a^*) / P_\pi(u_a^*)$$

tends to zero as $N$ tends to infinity, consider the open covering of $B$
formed by the sets

$$B(\pi') = \{ \pi'' \text{ in } A^+ : |\pi'' - \pi'| < \rho(\pi') \}$$

for $\pi'$ in $B$. Since $B$ is also compact, $\pi'_1, \pi'_2, \ldots, \pi'_m$ can be selected such that
$B(\pi'_1), B(\pi'_2), \ldots, B(\pi'_m)$ covers $B$. By the strong law of large numbers,

$$\frac{1}{N} \sum_{a=1}^{N} \{ \log s(u_a^*, \pi'_i, \rho(\pi'_i)) - \log P_\pi(u_a^*) \}$$

almost surely tends to $E \log s(u, \pi'_i, \rho(\pi'_i)) - E \log P_\pi(u) < 0$.
Consequently with probability one,

$$\sum_{a=1}^{N} \log [\, s(u_a^*, \pi'_i, \rho(\pi'_i)) / P_\pi(u_a^*) ] \to -\infty$$

and

$$\lim_{N} \sup_{\pi'' \in B(\pi'_i)} \prod_{a=1}^{N} \pi'' \cdot \beta(u_a^*) / P_\pi(u_a^*) = 0 .$$

Since

$$0 \le \sup_{\pi'' \in B} \prod_{a=1}^{N} P_{\pi''}(u_a^*) \le \sum_{i=1}^{m} \sup_{\pi'' \in B(\pi'_i)} \prod_{a=1}^{N} \pi'' \cdot \beta(u_a^*) ,$$

with probability one

$$\lim_{N} \sup_{\pi'' \in B} \prod_{a=1}^{N} P_{\pi''}(u_a^*) \Big/ \prod_{a=1}^{N} P_\pi(u_a^*) = 0 .$$

Finally, to show $\hat\pi_N$ converges almost surely to $\pi$, it is shown that for
any positive $\varepsilon$, with probability one, $|\hat\pi_N - \pi|$ is eventually less than
$\varepsilon$. Let $B$ be the closed set excluding $\pi$,

$$B = \{ \pi' \text{ in } A : |\pi' - \pi| \ge \varepsilon \} .$$

With probability one

$$\sup_{\pi' \in B} \prod_{a=1}^{N} P_{\pi'}(u_a^*) \Big/ \prod_{a=1}^{N} P_\pi(u_a^*)$$

is less than one for sufficiently large $N$. Thus with probability one $\hat\pi_N$
is eventually in the complement of $B$, i.e., $|\hat\pi_N - \pi|$ is eventually less
than $\varepsilon$. //

Section  III 

A  Dichotomy  for  Ability  Distributions 

A common starting point for studying the asymptotic distribution of
maximum likelihood estimates is the family of likelihood equations

$$0 = \frac{\partial L_N}{\partial \pi_j}(\hat\pi_N), \qquad j = 0, 1, \ldots, J .$$

It turns out that these equations are false for our current formulation of
the estimation problem.

The equations are valid if $\hat\pi_N$ is the maximum of $L_N$ over an open
$J+1$ dimensional subset of $A$. But $A$ has no $J+1$ dimensional subsets
because, as shown below, it is a $J$ dimensional subset of a $J+1$
dimensional space.

In  this  section  A  is  reparameterized  as  a  J  dimensional  set,  i.e., 
the  points  of  A  are  expressed  as  functions  of  J  numbers.  The 
reparameterization  suggests  a  dichotomy  of  the  distributions  on  [c,d]  .  We 
distinguish  "regular"  distributions  that  correspond  to  points  in  the 
interior  of  the  new  set  of  parameters  for  A  and  "irregular"  distributions 
that  correspond  to  boundary  points.  The  distinction  is  important  because  in 
this  paper  the  asymptotic  distribution  of  the  mle  is  worked  out  in  detail 
only  for  regular  distributions. 

In  the  process  of  attempting  to  show  that  the  irregular  distributions 
were  pathological  and  safe  to  ignore  a  surprising  result  was  obtained.  It 
was  found  that  for  the  most  popular  item  response  models,  every  distribution 
is equivalent to some discrete probability distribution on at most $J+1$
points.

Section III.1: Reparameterization

Since $1 = \sum_{u^*} P_{\pi'}(u^*) = \pi' \cdot \sum_{u^*} \beta(u^*)$, every vector $\pi'$ in $A$
satisfies the equation $\pi' \cdot \bar\beta = 1$ for $\bar\beta = \sum_{u^*} \beta(u^*)$. Thus $A$ is
contained in a $J$ dimensional subset of $E^{J+1}$ and no point of $A$ has an
open neighborhood contained in $A$.

To reparameterize $A$ let $\{ \bar\beta / |\bar\beta|, z^1, z^2, \ldots, z^J \}$ be an orthonormal
basis for $E^{J+1}$ and $Z$ be the $(J+1) \times J$ matrix

$$Z = [z^1, z^2, \ldots, z^J] .$$

Thus $x \to Z^T x$ maps the set of vectors orthogonal to $\bar\beta$ one-to-one onto
$E^J$. Since $x \to Z^T x$ is also linear, it follows that for any fixed $\pi^0$ in
$A$

$$\pi' \to Z^T(\pi' - \pi^0)$$

maps $A$ one-to-one, onto some convex, compact subset of $E^J$ and that each
$\pi'$ in $A$ can be expressed as

$$\pi' = \pi^0 + Zs$$

for exactly one $J$ vector $s$ in a convex, compact subset of $E^J$.

Let $B = \{ Z^T(\pi' - \pi^0) : \pi' \in A \}$ be the set of possible $J$ vectors
$s$. It will be shown that $B$ is $J$ dimensional so that distributions on
$[c,d]$ can be classified into two non-empty sets: regular distributions
with $s$ in the interior of $B$ and irregular distributions corresponding to
vectors on the boundary of $B$. Before resuming the study of the
distribution of the maximum likelihood estimate some facts about regular and
irregular distributions will be proven.
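The reparameterization can be sketched in a few lines. The example below is an assumption-laden illustration: `BETA_BAR` is an invented four-dimensional vector standing in for $\bar\beta$, the columns $z^1, \ldots, z^J$ are produced by Gram-Schmidt applied to the coordinate vectors (any orthonormal completion would serve), and `PI_0` and `PI_1` are arbitrary points on the hyperplane $\pi' \cdot \bar\beta = 1$ that have not been checked for membership in $A$.

```python
import math

# Hypothetical stand-in for beta-bar (J + 1 = 4); illustrative values only.
BETA_BAR = [2.0, 1.0, -0.5, 0.3]


def dot(x, y):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def orthonormal_complement(beta_bar):
    """Return z^1..z^J: orthonormal vectors that are orthogonal to beta_bar."""
    dim = len(beta_bar)
    norm = math.sqrt(dot(beta_bar, beta_bar))
    basis = [[x / norm for x in beta_bar]]
    for k in range(dim):
        r = [1.0 if i == k else 0.0 for i in range(dim)]
        for _ in range(2):  # two passes for numerical orthogonality
            for b in basis:
                c = dot(r, b)
                r = [x - c * y for x, y in zip(r, b)]
        size = math.sqrt(dot(r, r))
        if size > 1e-10 and len(basis) < dim:
            basis.append([x / size for x in r])
    return basis[1:]  # drop beta_bar / |beta_bar|


def on_hyperplane(v):
    """Rescale v so that v . beta_bar = 1."""
    scale = dot(v, BETA_BAR)
    return [x / scale for x in v]


Z = orthonormal_complement(BETA_BAR)       # columns z^1..z^J stored as rows
PI_0 = on_hyperplane([0.3, 0.4, 0.0, 0.2])
PI_1 = on_hyperplane([0.5, 0.2, 0.1, -0.3])

# s = Z^T (pi' - pi^0), then pi' is recovered as pi^0 + Z s.
delta = [a - b for a, b in zip(PI_1, PI_0)]
s = [dot(z, delta) for z in Z]
recovered = [p0 + sum(sj * z[i] for sj, z in zip(s, Z)) for i, p0 in enumerate(PI_0)]
max_gap = max(abs(a - b) for a, b in zip(recovered, PI_1))
```

The round trip is exact up to rounding because the difference of two points on the hyperplane is orthogonal to $\bar\beta$, so it lies in the span of the columns of $Z$.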
Section III.2: Regular and Irregular Distributions

A distribution on $[c,d]$, its quasidensity $g_{\pi'}$, and its coordinate
vector $\pi'$ will be called regular if for some $\rho > 0$ the $J$-dimensional open
ball centered at $\pi'$

$$\{ \pi'' \text{ in } E^{J+1} : \pi'' \cdot \bar\beta = 1 \text{ and } |\pi'' - \pi'| < \rho \}$$

is a subset of $A$. If a distribution is not regular, then it (and its
quasidensity and coordinate vector) will be called irregular. Equivalently,
a distribution with quasidensity $g_{\pi'}$ is regular if and only if $Z^T(\pi' - \pi^0)$
is an interior point of $B$.

The uniform distribution and most distributions expected in applications are regular. However, if the pattern likelihood function $\ell(u^*,t)$ is unimodal with maximum at $t_0$ in $[c,d]$, then the unit step function at $t_0$ is an irregular distribution on $[c,d]$. The first result shows that some distributions are regular, i.e., that $B$ is $J$ dimensional.


III.1 If $G$ is a distribution on $[c,d]$ with a positive quasidensity and if for $c<t<d$ the quasidensity of $G$ evaluated at $t$ equals the derivative of $G$ at $t$, then $G$ is regular. In particular, the uniform distribution is regular.


Proof: If $\frac{d}{dt} G(t) = \sum_j \pi'_j h_j(t) = g_{\pi'}(t) > 0$ for $c<t<d$, and $g_{\pi'}$ is also positive at $c$ and $d$, then $\min_t g_{\pi'}(t) > 0$. If $|\pi'' - \pi'|_\infty = \max_j |\pi''_j - \pi'_j|$ is sufficiently small, then $|g_{\pi''-\pi'}(t)| \le \tfrac12 \min_t g_{\pi'}(t)$. Consequently for sufficiently small $|\pi'' - \pi'|_\infty$

$$g_{\pi''}(t) \ge \min_t g_{\pi'}(t) - \max_t |g_{\pi''-\pi'}(t)| > 0 .$$

Since $|\cdot|_2$ and $|\cdot|_\infty$ determine the same topology on $E^{J+1}$, for some $\rho_0>0$, $|\pi''-\pi'|_2 < \rho_0$ implies $g_{\pi''}(t)>0$ for $t$ in $[c,d]$.

To show $|\pi''-\pi'|_2 < \rho_0$ and $\pi'' \cdot \beta = 1$ imply $\pi''$ is in $A$ it suffices to show that these assumptions imply that the function $G$

$$G(x) = \begin{cases} 0 & \text{if } x < c \\ \int_c^x g_{\pi''}(t)\,dt & \text{if } c \le x \le d \\ 1 & \text{if } d < x \end{cases}$$

is a distribution on $[c,d]$ and $g_{\pi''}$ is its quasidensity. The only nontrivial step in verifying that $G$ is a distribution on $[c,d]$ is showing $G(d)=1$. Since for all $t$, $\sum_{u^*} \ell(u^*,t) = 1$,

$$G(d) = \int_c^d g_{\pi''}(t)\,dt = \int_c^d \sum_{u^*} \ell(u^*,t)\, g_{\pi''}(t)\,dt = \sum_{u^*} \sum_j \langle \ell(u^*,\cdot), h_j \rangle\, \pi''_j = \beta \cdot \pi'' = 1 .$$

Consequently, $G$ is a distribution on $[c,d]$, $g_{\pi''}$ is the restriction of its probability density to $[c,d]$, $g_{\pi''}$ is its quasidensity (by the uniqueness of quasidensities in I.1), and $\pi''$ is in $A$. In particular, since $1 = \sum_{u^*} \ell(u^*,t)$ is in the CS, the uniform density $g(t) = [d-c]^{-1}$ is in the CS, and the uniform distribution is regular. //

The  next  result  gives  examples  of  regular  distributions  that  do  not 
have  continuous  densities.  It  shows  how  to  approximate  any  distribution  on 
[c,d]  with  a  regular  distribution. 

III.2 If $t_0 < t_1 < \ldots < t_J$ are in $(c,d)$ and the vectors $\pi^i = \pi(t_i)$ are linearly independent, then for any vector $\alpha$ in $E^{J+1}$, if

$$\alpha_j > 0, \quad j = 0, \ldots, J$$

$$\sum_j \alpha_j = 1$$

then the discrete distribution function

$$\sum_{\{i:\, t_i \le t\}} \alpha_i$$

is regular. More generally, if $G_0, G_1, \ldots, G_i, \ldots, G_J$ are distributions on $[c,d]$ with quasidensities $g_{\pi^i}$ and the $\pi^i$ are linearly independent, then $\alpha>0$ and $\sum_i \alpha_i = 1$ imply that $\sum_i \alpha_i G_i$ is a regular distribution on $[c,d]$.

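Proof: Clearly for each vector $\alpha>0$ such that $\sum_i \alpha_i = 1$ the convex combination $\sum_i \alpha_i G_i(\cdot)$ of distributions $G_i$ on $[c,d]$ is a distribution on $[c,d]$. Let $\alpha^0>0$ be the vector of coefficients of any such combination. A value of $\rho$ will be computed to show $\sum_i \alpha^0_i \pi^i$ is regular. For any $\alpha$

$$\Big|\sum_i \alpha^0_i \pi^i - \sum_i \alpha_i \pi^i\Big|^2 = \sum_{i,j} (\alpha^0_i - \alpha_i)(\alpha^0_j - \alpha_j)\, (\pi^i)^T \pi^j = (\alpha^0 - \alpha)^T Q (\alpha^0 - \alpha)$$

for positive definite or positive semidefinite $Q$. Since the $\pi^i$ are linearly independent, $Q$ is definite. Consequently

$$\Big|\sum_i \alpha^0_i \pi^i - \sum_i \alpha_i \pi^i\Big| \ge \epsilon^{1/2}\, |\alpha^0 - \alpha| ,$$

where $\epsilon>0$ is the smallest eigenvalue of $Q$. Since the $J+1$ linearly independent $\pi^i$ form a basis for $E^{J+1}$, for any $\pi'$ in $E^{J+1}$ there is a unique $\alpha'$ in $E^{J+1}$ such that $\pi' = \sum_i \alpha'_i \pi^i$. If

$$\Big|\sum_i \alpha^0_i \pi^i - \pi'\Big| < \tfrac12 \epsilon^{1/2} \min_i(\alpha^0_i)$$

then

$$|\alpha^0 - \alpha'| < \tfrac12 \min_i(\alpha^0_i) ,$$

$$\max_i |\alpha^0_i - \alpha'_i| < \tfrac12 \min_i(\alpha^0_i) ,$$

and

$$0 < \alpha'_i \quad \text{for } i = 0, 1, \ldots, J .$$

Furthermore, if $\pi' \cdot \beta = 1$, then

$$1 = \Big(\sum_i \alpha'_i \pi^i\Big) \cdot \beta = \sum_i \alpha'_i (\pi^i \cdot \beta) = \sum_i \alpha'_i .$$

Thus for $\rho = \tfrac12 \epsilon^{1/2} \min_i(\alpha^0_i)$,

$$\Big|\sum_i \alpha^0_i \pi^i - \pi'\Big| < \rho \quad \text{and} \quad \pi' \cdot \beta = 1$$

imply $g_{\pi'}$ is the quasidensity of some distribution on $[c,d]$, i.e. $\pi'$ is in $A$ and $\sum_i \alpha^0_i \pi^i$ is regular. //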

The  following  result  helps  to  visualize  irregular  distributions. 

III.3 If $\pi'$ is regular and $\pi''$ is in $A$, then for sufficiently small positive $\epsilon$, $\pi'' + (1+\epsilon)(\pi' - \pi'')$ is also in $A$.

Thus to obtain an irregular point one starts with any two points of $A$, $\pi^0$ and $\pi' \ne \pi^0$. Since $A$ is convex, closed and bounded one can move through $A$ along the ray from $\pi^0$ through $\pi'$ to a point $\pi^0 + k(\pi'-\pi^0)$ in $A$ such that $\pi^0 + k'(\pi'-\pi^0)$ is not in $A$ for any $k'>k$. Thus $\pi^0 + k(\pi'-\pi^0)$ cannot be regular.

Proof: $\beta \cdot [\pi'' + (1+\epsilon)(\pi'-\pi'')] = 1 + (1+\epsilon)(1-1) = 1$ and, for sufficiently small positive $\epsilon$, $|\pi' - \{\pi''+(1+\epsilon)(\pi'-\pi'')\}| = \epsilon\,|\pi'-\pi''| < \rho$. //
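
The ray construction described before the proof can be sketched numerically. The example below is only an illustration under stated assumptions: the true set $A$ depends on the item response model, so the probability simplex in three dimensions is used as a hypothetical stand-in (it is convex, closed, bounded and lies in the hyperplane $\pi \cdot \beta = 1$ with $\beta = (1,1,1)$). Bisection finds the largest $k$ for which $\pi^0 + k(\pi' - \pi^0)$ remains in the set.

```python
def in_toy_A(x, tol=1e-12):
    """Hypothetical stand-in for A: the probability simplex, beta = (1, 1, 1)."""
    return abs(sum(x) - 1.0) < 1e-9 and all(xi >= -tol for xi in x)

def point_on_ray(pi0, pi1, k):
    """Return pi0 + k (pi1 - pi0)."""
    return [a + k * (b - a) for a, b in zip(pi0, pi1)]

def largest_k(pi0, pi1, member, k_hi=1e6, iters=200):
    """Bisection for the largest k with pi0 + k (pi1 - pi0) still in the set.

    Convexity makes membership along the ray monotone in k, which is what
    justifies bisection here.
    """
    k_lo = 1.0                                  # k = 1 gives pi1, assumed in the set
    if member(point_on_ray(pi0, pi1, k_hi)):
        raise ValueError("ray does not leave the set below k_hi")
    for _ in range(iters):
        mid = 0.5 * (k_lo + k_hi)
        if member(point_on_ray(pi0, pi1, mid)):
            k_lo = mid
        else:
            k_hi = mid
    return k_lo

pi0 = [0.2, 0.3, 0.5]       # hypothetical starting point in the toy set
pi1 = [0.3, 0.3, 0.4]       # hypothetical second point in the toy set
k = largest_k(pi0, pi1, in_toy_A)
boundary = point_on_ray(pi0, pi1, k)   # an irregular point of the toy set
```

In this toy case the third coordinate $0.5 - 0.1k$ is the first to reach zero, so the ray leaves the set at $k = 5$.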

Irregular distributions can also be obtained from the many unimodal functions (e.g. most likelihood functions) in the CS.

III.4 If $f$ is a function in the CS with a unique maximizer $t_0$, i.e., if $f(t) \ge f(t_0)$ implies $t = t_0$ for all $t$ in $[c,d]$, then the unit step function at $t_0$ is irregular.

Proof: Since $f$ is a linear combination of likelihood functions, for any distribution function $G$ on $[c,d]$, $\int_c^d f(t)\, dG(t) = \langle f,g \rangle$ where $g$ is the quasidensity of $G$. In particular for $g_{\pi(t_0)}$, the quasidensity of the unit step at $t_0$, $\langle f, g_{\pi(t_0)} \rangle = f(t_0)$. Note that if $G$ is the distribution of a random variable $\omega$ then $\langle f,g \rangle = E[f(\omega)]$. Thus for every quasidensity $g_{\pi'}$ with $\pi'$ in $A$, $\langle f, g_{\pi'} \rangle \le \max_t f(t) = f(t_0)$. In particular for $g_{\pi(t_1)}$, the quasidensity of the unit step at $t_1 \ne t_0$,

$$f(t_1) < f(t_0) = \langle f, g_{\pi(t_0)} \rangle .$$

Consequently for any $\epsilon>0$, for $\pi(\epsilon) = \pi(t_1) + (1+\epsilon)[\pi(t_0) - \pi(t_1)]$

$$\langle f, g_{\pi(\epsilon)} \rangle = \langle f, g_{\pi(t_0)} \rangle + \epsilon\,[\langle f, g_{\pi(t_0)} \rangle - \langle f, g_{\pi(t_1)} \rangle] = f(t_0) + \epsilon\,[f(t_0) - f(t_1)] > f(t_0) .$$

Thus $g_{\pi(\epsilon)}$ cannot be the quasidensity of any distribution on $[c,d]$ and from III.3, the unit step at $t_0$ is not regular. //
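
The inequality at the heart of this proof is easy to check numerically. The sketch below assumes a made-up unimodal function on $[0,1]$ (it is not claimed to belong to any particular CS) and uses the linearity of $\langle f, g_\pi \rangle$ in $\pi$: the extrapolated vector $\pi(\epsilon)$ corresponds to a signed measure with weight $1+\epsilon$ at $t_0$ and weight $-\epsilon$ at $t_1$.

```python
def expect(f, points, weights):
    """Integral of f against a discrete (possibly signed) measure."""
    return sum(w * f(t) for t, w in zip(points, weights))

def f(t):
    # Hypothetical unimodal function with unique maximizer t0 = 0.3 on [0, 1].
    return 1.0 - (t - 0.3) ** 2

t0, t1, eps = 0.3, 0.8, 0.05

# A genuine distribution (here uniform on an 11-point grid) gives at most f(t0).
grid = [i / 10 for i in range(11)]
uniform = [1 / 11] * 11
value_uniform = expect(f, grid, uniform)

# pi(eps) = pi(t1) + (1 + eps)[pi(t0) - pi(t1)] acts like the signed measure
# (1 + eps) * delta_{t0} - eps * delta_{t1}.
value_extrapolated = expect(f, [t0, t1], [1 + eps, -eps])
```

Here `value_extrapolated` equals $f(t_0) + \epsilon[f(t_0) - f(t_1)]$, which exceeds $f(t_0)$; no probability distribution can produce such a value, which is why $\pi(\epsilon)$ falls outside $A$.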

In fact a stronger result can be proven. It can be shown that the unit step at a maximizer corresponds to a point $\pi'$ in $A$ situated like a vertex of a polyhedron or a boundary point of an ellipsoid: If $\epsilon>0$ and $\pi'' \ne \pi'$ is also in $A$, then $\pi'' + (1+\epsilon)(\pi'-\pi'')$ is not in $A$. In other words, $\pi'$ is not an interior point of the intersection of $A$ and any line through $\pi'$.

III.5 If $f$ is a function in the CS with unique maximizer $t_0$, then $\pi' = \pi(t_0)$ is not an interior point of the intersection of $A$ and any line through $\pi'$.

Proof: Let $G$ be any distribution on $[c,d]$ other than the unit step at $t_0$. Let $[c',d']$ be any closed subinterval of $[c,d]$ not containing $t_0$ such that $\int_{c'}^{d'} dG(t) > 0$. Then for $g_{\pi'}$ equal to the quasidensity of $G$,

$$\langle f, g_{\pi'} \rangle = \int_{c'}^{d'} f(t)\, dG(t) + \int_{t \text{ not in } [c',d']} f(t)\, dG(t)$$

$$\le \max_{c' \le t \le d'} f(t) \int_{c'}^{d'} dG(t) + f(t_0)\Big[1 - \int_{c'}^{d'} dG(t)\Big]$$

$$< f(t_0) \int_{c'}^{d'} dG(t) + f(t_0)\Big[1 - \int_{c'}^{d'} dG(t)\Big]$$

$$= f(t_0) = \langle f, g_{\pi(t_0)} \rangle .$$

From the argument in III.4 applied to $\pi(\epsilon) = \pi' + (1+\epsilon)[\pi(t_0) - \pi']$ it follows that for all $\pi' \ne \pi(t_0)$ in $A$ and for $\epsilon>0$, $\pi(\epsilon)$ is not in $A$. //

Not  every  irregular  distribution  is  mapped  to  one  of  these  points  that 
"stick  out"  from  A  like  a  vertex  or  a  point  of  positive  Gaussian  curvature 
on  a  boundary.  By  taking  linear  combinations  of  unimodal  functions  in  the 
CS  one  can  construct  bimodal  and  multimodal  functions  with  modes  of  equal 
height.  The  reasoning  used  in  III. 4  and  III. 5  can  then  be  used  to  show 
that  A  has  edges  and  faces  too. 

III.6 Let $f$ be a function in the CS having modes of equal height at $t_0 < t_1 < \ldots < t_m$ so that $f(t_0) = f(t_1) = \ldots = f(t_m)$ and for $t$ in $[c,d] \setminus \{t_0, t_1, \ldots, t_m\}$, $f(t) < f(t_0)$. Let $G_0, G_1, \ldots, G_m$ denote the unit step functions at $t_0, t_1, \ldots, t_m$. Then for any positive numbers $\alpha_i$ such that $\sum_{i=0}^m \alpha_i = 1$, the distribution $\sum_i \alpha_i G_i$ is irregular.


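Proof: Let $g_{\pi(s)}$ be the quasidensity of the unit step function at $s$ in $[c,d] \setminus \{t_0, t_1, \ldots, t_m\}$. Then $\langle f, g_{\pi(s)} \rangle = f(s) < f(t_0) = \sum_{i=0}^m \alpha_i f(t_i)$. Thus for $g_{\pi(t_i)}$ equal to the quasidensity of the unit step at $t_i$ and

$$\pi(\epsilon) = \pi(s) + (1+\epsilon)\Big[\sum_i \alpha_i \pi(t_i) - \pi(s)\Big]$$

$\epsilon>0$ implies $\langle f, g_{\pi(\epsilon)} \rangle = f(t_0) + \epsilon\,[f(t_0) - f(s)] > f(t_0)$. Thus $\sum_i \alpha_i G_i$ is irregular. //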


The following corollary reconciles the apparent contradiction between III.2 and III.6.

III.7 If for $f$ in the CS there are more than $J$ numbers $t$ satisfying $f(t) \ge f(s)$ for all $s$ in $[c,d]$, then the quasidensities of the unit step distributions at these numbers are linearly dependent.

Two  ability  distributions  are  called  equivalent  if  the  probability 
distribution  of  any  function  of  the  item  scores  does  not  depend  on  which  of 
the  two  ability  distributions  is  used  to  compute  the  distribution.  Thus,  if 
F  and  G  are  equivalent,  then  data  cannot  be  used  to  determine  which  of 
the  two  distributions  is  correct.  A  necessary  and  sufficient  condition  for 
two  distributions  on  [c,d]  to  be  equivalent  is  that  they  have  the  same 
quasidensity  (Levine,  1989).  We  will  show  that  for  logistic  models  all 
distributions  are  equivalent  to  discrete  distributions.  The  result  is  valid 
for models whose item response functions each have, for every $t$, a power series that converges absolutely in some neighborhood of $t$.

III.8 If the constant functions are the only functions in the CS that are constant on some nonempty open subset of $[c,d]$, then every distribution on $[c,d]$ is equivalent to a distribution with at most J+1 points of increase.

Proof: It is sufficient to prove that there are finitely many points of increase because $A$ has been shown to be $J$ dimensional. It is sufficient to limit attention to irregular distributions because if $\pi'$ is not on the boundary $\partial A$ of $A$ then compactness of $A$ implies that for any $\pi^1$ in $\partial A$ we can choose $t>1$ such that $\pi^2 = \pi^1 + t(\pi' - \pi^1)$ is also on $\partial A$. The equation

$$\pi' = \Big(1 - \frac1t\Big)\pi^1 + \frac1t\,\pi^2$$
shows that $\pi'$ corresponds to a probability mixture of distributions mapped to the boundary of $A$.

Let $G$ be a distribution on $[c,d]$ with quasidensity $g_{\pi'}$ for $\pi'$ on $\partial A$. Let $H = \{x \text{ in } E^{J+1} : n \cdot x = c\}$ be a hyperplane in $E^{J+1}$ containing $\pi'$ and no points of the interior of $A$ (here $c$ denotes the hyperplane constant, not the left endpoint of $[c,d]$). Without loss of generality it can be assumed that $x$ in $A$ implies $n \cdot x \le c$ because $n \cdot x = c$ implies $(-n) \cdot x = -c$. It follows that among distributions on $[c,d]$, $G$ is a maximizer of $\int \sum_j n_j h_j(t)\, dG(t) = n \cdot \pi'$. Either $g_n(\cdot) = \sum_j n_j h_j(\cdot)$ is constant on some subinterval of $[c,d]$ (and therefore constant) or there are only finitely many numbers $t$ such that $g_n(t) \ge g_n(s)$ for all $s$ in $[c,d]$. $g_n$ cannot be constant for otherwise $n \cdot \pi'' = \int g_n(t)\, g_{\pi''}(t)\, dt = c$ would be independent of $\pi''$ and $A$ would have no interior points. Thus for finitely many numbers $t_1 < \ldots < t_K$ in $[c,d]$, $g_n(t_k) = \max_{t \in [c,d]} g_n(t)$. Thus a distribution $G'$ (such as $G$) maximizes $\int g_n(t)\, dG'(t)$ if and only if it is equivalent to

$$\sum_{k=1}^K \alpha_k F_k(\cdot)$$

where $F_k$ is the unit step at $t_k$, for some positive numbers $\alpha_k$ such that $\sum_k \alpha_k = 1$. //
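
The conclusion of III.8 can be illustrated for a distribution that is already discrete. The sketch below does not follow the supporting-hyperplane argument of the proof; it swaps in a Carathéodory-style elimination step, and everything in it is an assumption for illustration: a hypothetical two-item test, made-up success probabilities at six ability points, and uniform starting weights. With four response patterns, the pattern probability vectors span at most four dimensions, so the support can be reduced to at most four points without changing any pattern probability.

```python
from fractions import Fraction

def null_vector(columns):
    """Nonzero c with sum_i c[i] * columns[i] == 0; needs more columns than rows."""
    rows, k = len(columns[0]), len(columns)
    a = [[columns[j][i] for j in range(k)] for i in range(rows)]
    pivots, r = [], 0
    for col in range(k):                      # Gauss-Jordan elimination to RREF
        if r == rows:
            break
        p = next((i for i in range(r, rows) if a[i][col] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        pv = a[r][col]
        a[r] = [x / pv for x in a[r]]
        for i in range(rows):
            if i != r and a[i][col] != 0:
                factor = a[i][col]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append((r, col))
        r += 1
    pivot_cols = {c for _, c in pivots}
    free = next(c for c in range(k) if c not in pivot_cols)
    c_vec = [Fraction(0)] * k
    c_vec[free] = Fraction(1)                 # set one free variable to 1
    for row, col in pivots:
        c_vec[col] = -a[row][free]            # solve for the pivot variables
    return c_vec

def reduce_support(points, weights, features):
    """Drop support points while keeping sum_i w_i * features[i] unchanged."""
    w = list(weights)
    active = [i for i, wi in enumerate(w) if wi > 0]
    dim = len(features[0])
    while len(active) > dim:
        c = null_vector([features[i] for i in active])
        # Move along c until the first weight reaches exactly zero.
        step = min(w[i] / ci for i, ci in zip(active, c) if ci > 0)
        for i, ci in zip(active, c):
            w[i] -= step * ci
        active = [i for i in active if w[i] > 0]
    return [(points[i], w[i]) for i in active]

def pattern_probs(p1, p2):
    """Probabilities of patterns (0,0), (0,1), (1,0), (1,1) for two independent items."""
    return [(1 - p1) * (1 - p2), (1 - p1) * p2, p1 * (1 - p2), p1 * p2]

# Hypothetical ability points and item success probabilities (illustration only).
points = [-2, -1, 0, 1, 2, 3]
item1 = [Fraction(1, 10), Fraction(1, 5), Fraction(2, 5),
         Fraction(3, 5), Fraction(4, 5), Fraction(9, 10)]
item2 = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2),
         Fraction(7, 10), Fraction(4, 5), Fraction(19, 20)]
features = [pattern_probs(a, b) for a, b in zip(item1, item2)]
weights = [Fraction(1, 6)] * 6

reduced = reduce_support(points, weights, features)   # list of (point, weight)
```

Exact rational arithmetic is used so that a weight reaching zero is detected exactly. Because each pattern probability vector sums to one, any null vector has coefficients summing to zero, so the reduced weights still sum to one.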


Section  Four 

Asymptotic  Normality  for  Regular  Distributions 


In this section it will be shown that if the ability distribution is regular then $\hat\pi_N$, the maximum likelihood estimate of $\pi$, is asymptotically normal. A formula is derived for the asymptotic dispersion matrix.

Throughout this section the distribution for $\theta$ is assumed to be regular, and $\rho$ is a fixed positive number chosen so that the intersection of the open ball with radius $\rho$ centered at $\pi$

$$\{\pi' \text{ in } E^{J+1} : |\pi' - \pi| < \rho\}$$

and the hyperplane

$$\{\pi' \text{ in } E^{J+1} : \pi' \cdot \beta = 1\}$$

is a subset of $A$. As in Section III.1, let $z^1, z^2, \ldots, z^J$ be any orthonormal basis for the annihilator of $\beta$ and let $Z = [z^1, z^2, \ldots, z^J]$ be the $(J+1) \times J$ matrix formed from the $z$'s so that $\pi' \to ZZ^T \pi'$ is the orthogonal projection onto the annihilator of $\beta$.

To describe the asymptotic behavior of $\hat\pi_N$ we will need the information matrix, i.e. the matrix $I$ of expected second derivatives with typical entry

$$I_{ij} = -E\, \frac{\partial^2}{\partial \pi'_i\, \partial \pi'_j} \log P_{\pi'}(u) \Big|_{\pi'=\pi}$$

$$= -\sum_{u^*} P_\pi(u^*)\, \frac{\partial^2}{\partial \pi'_i\, \partial \pi'_j} \log\, \pi' \cdot \beta(u^*) \Big|_{\pi'=\pi}$$

$$= \sum_{u^*} P_\pi(u^*)\, \frac{\beta_i(u^*)\, \beta_j(u^*)}{[\pi \cdot \beta(u^*)]^2} .$$

Thus $I$ can be written in the form $\sum_\nu w(u^*_\nu)\, \beta(u^*_\nu)\, \beta(u^*_\nu)^T$ and II.1 can be used to show it is non-singular. Since the columns of $Z$ are independent, II.1 also implies that for any positive weights

$$w_\nu = w(u^*_\nu)$$

the matrix

$$\sum_{\nu=1}^{2^n} w_\nu\, Z^T \beta(u^*_\nu)\, \beta^T(u^*_\nu)\, Z$$

is non-singular. In particular, $Z^T I Z$ is non-singular.

Since $\hat\pi_N$ almost surely converges to $\pi$, with probability one $\hat\pi_N$ will eventually be within $\rho$ of $\pi$. For $\hat s_N = Z^T(\hat\pi_N - \pi)$ the mle can be written

$$\hat\pi_N = \pi + Z \hat s_N .$$

Since $|\hat\pi_N - \pi| = |\hat s_N|$, almost surely $|\hat s_N| < \rho$ for sufficiently large $N$. Defining $M_N$ on the open set $\{s \text{ in } E^J : |s| < \rho\}$ by $M_N(s) = L_N[\pi + Zs]$, it follows that $\hat s_N$ maximizes $M_N$ and consequently must satisfy the equations

$$\frac{\partial}{\partial s_j} M_N(\hat s_N) = 0, \quad j = 1, 2, \ldots, J .$$

In fact, $\hat s_N$ almost surely eventually is the only solution of the equations with length less than $\rho$ because with probability one, for sufficiently large $N$, $J+1$ patterns with linearly independent $\beta$'s will be sampled, and this implies that the Hessian matrix evaluated at $|s| < \rho$

$$\partial^2 M_N(s) = -Z^T \Big[\sum_{a=1}^N \beta(u^*_a)\, \beta^T(u^*_a)\, [P_{\pi+Zs}(u^*_a)]^{-2}\Big] Z$$

is definite.
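
Because the Hessian is definite, the likelihood equations can be solved by Newton's method. The sketch below is a toy illustration rather than the estimation procedure of this report: it assumes $J = 1$, a single item with two response patterns, and made-up vectors $\beta(u^*)$ that sum to $\beta = (1,1)$. It takes $M_N(s) = \sum_a \log P_{\pi+Zs}(u^*_a)$, with $P_{\pi'}(u^*) = \pi' \cdot \beta(u^*)$.

```python
import math

# Hypothetical beta(u*) for patterns u* = 0 and u* = 1; they sum to beta = (1, 1).
beta_by_pattern = {0: (0.7, 0.2), 1: (0.3, 0.8)}
pi = (0.5, 0.5)                                  # expansion point, pi . beta = 1
z = (1 / math.sqrt(2), -1 / math.sqrt(2))        # single column of Z, orthogonal to beta

def prob(u, s):
    """P_{pi + Z s}(u) = (pi + z s) . beta(u)."""
    b = beta_by_pattern[u]
    return sum((p + zi * s) * bi for p, zi, bi in zip(pi, z, b))

def zb(u):
    """z . beta(u), the derivative of prob(u, s) with respect to s."""
    return sum(zi * bi for zi, bi in zip(z, beta_by_pattern[u]))

def newton_mle(sample, s=0.0, iters=50):
    """Newton iteration on the score dM_N/ds = sum_a z.beta(u_a) / P(u_a)."""
    for _ in range(iters):
        grad = sum(zb(u) / prob(u, s) for u in sample)
        hess = -sum(zb(u) ** 2 / prob(u, s) ** 2 for u in sample)
        s -= grad / hess
    return s

sample = [1] * 60 + [0] * 40       # hypothetical data: 60 of 100 responses equal 1
s_hat = newton_mle(sample)
p_hat = prob(1, s_hat)             # fitted probability of pattern 1
```

In this one-parameter toy model the maximum likelihood fit reproduces the sample proportion, so `p_hat` should agree with 0.6 and `s_hat` with $(0.6 - 0.55)/(z \cdot \beta(1))$.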

The asymptotic distribution of $\hat\pi_N$ is obtained with Taylor's formula applied to the gradient of $M_N$.


IV.2 (Asymptotic distribution of the maximum likelihood estimate). Let $z^1, \ldots, z^J$ be any orthonormal basis for $\mathrm{Nul}(\beta)$ and $Z = [z^1, \ldots, z^J]$. If the ability distribution is regular then $\hat\pi_N$ converges in distribution to

$$\pi + N^{-1/2} Z n$$

where $n$ is multinormal with zero mean and covariance matrix $(Z^T I Z)^{-1}$.
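
As a sanity check on the covariance formula, the following sketch evaluates $(Z^T I Z)^{-1}$ for a toy model and compares it with a direct calculation. All inputs are assumptions for illustration: $J = 1$, one item with two response patterns, and made-up vectors $\beta(u^*)$. In this toy case $\hat s_N$ is a linear function of the sample proportion of pattern 1, so the variance of $N^{1/2}\hat s_N$ can also be computed from the binomial variance.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Hypothetical beta(u*) for the two patterns; they sum to beta = (1, 1).
betas = [(0.7, 0.2), (0.3, 0.8)]
pi = (0.5, 0.5)                                   # true coordinate vector, pi . beta = 1
z = (1 / math.sqrt(2), -1 / math.sqrt(2))         # single column of Z

probs = [dot(pi, b) for b in betas]               # P_pi(u*) = pi . beta(u*)

# Information matrix: I = sum_u P(u) beta(u) beta(u)^T / [pi . beta(u)]^2.
info = [[sum(p * b[i] * b[j] / p ** 2 for b, p in zip(betas, probs))
         for j in range(2)] for i in range(2)]

# Z^T I Z is a 1 x 1 matrix because J = 1.
ztiz = sum(z[i] * info[i][j] * z[j] for i in range(2) for j in range(2))
asymptotic_var = 1.0 / ztiz                       # variance of N^{1/2} s_hat

# Direct check: s_hat = (p_hat - P(1)) / (z . beta(1)), and N^{1/2}(p_hat - P(1))
# has limiting variance P(1)(1 - P(1)).
slope = 1.0 / dot(z, betas[1])
direct_var = slope ** 2 * probs[1] * (1.0 - probs[1])
```

The two numbers agree in this example, which is consistent with the theorem; the agreement is specific to the toy model and is not a substitute for the proof that follows.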

Proof: Since $\hat s_N = Z^T(\hat\pi_N - \pi)$ implies that $Z \hat s_N = \hat\pi_N - \pi$, the theorem can be proven by showing that $N^{1/2} \hat s_N$ converges in distribution to multinormal $n$. This will be done by showing that for any non-zero $J$-vector $t$, the random variable $N^{1/2}\, t \cdot \hat s_N$ is asymptotically normal with mean zero and variance $t^T (Z^T I Z)^{-1} t$.

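Each component of the gradient of $M_N$

$$\frac{\partial}{\partial s_k} M_N(s) = \sum_{a=1}^N [P_{\pi+Zs}(u^*_a)]^{-1}\, \beta(u^*_a) \cdot z^k, \quad k = 1, \ldots, J$$

is defined and has continuous partial derivatives of order two for $|s| < \rho$. Thus when $|\hat s_N|$ is less than $\rho$ there will be some $\xi_{N,k}$, $0 < \xi_{N,k} < 1$, such that for each $k \le J$

$$0 = \frac{\partial}{\partial s_k} M_N(0) + \sum_i \frac{\partial^2}{\partial s_i\, \partial s_k} M_N(0)\, \hat s_{N,i} + \frac12 \sum_i \Big[\sum_j \frac{\partial^3}{\partial s_i\, \partial s_j\, \partial s_k} M_N(\xi_{N,k} \hat s_N)\, \hat s_{N,j}\Big] \hat s_{N,i}$$

$$= \frac{\partial}{\partial s_k} M_N(0) + \sum_i \Big[\frac{\partial^2}{\partial s_i\, \partial s_k} M_N(0) + \frac12 \sum_j \frac{\partial^3}{\partial s_i\, \partial s_j\, \partial s_k} M_N(\xi_{N,k} \hat s_N)\, \hat s_{N,j}\Big] \hat s_{N,i} .$$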

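In matrix notation

$$0 = \partial M_N(0) + \big[\partial^2 M_N(0) + \tfrac12 C_N\big] \hat s_N$$

and

$$\Big[-\frac1N\, \partial^2 M_N(0) - \frac{1}{2N}\, C_N\Big] \hat s_N = \frac1N\, \partial M_N(0) .$$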


The right hand side of the last equation, being a mean of independent, identically distributed random vectors with zero expectation and covariance matrix $Z^T I Z$, is asymptotically normal with expectation 0 and covariance matrix $Z^T I Z / N$. Thus $N^{-1/2}\, \partial M_N(0)$ is asymptotically normal with mean zero and covariance matrix $Z^T I Z$.


Since the summands in

$$-\frac1N\, \partial^2 M_N(0) = \frac1N \sum_{a=1}^N [P_\pi(u^*_a)]^{-2}\, Z^T \beta(u^*_a)\, \beta^T(u^*_a)\, Z$$

are independent and identically distributed, $-\frac1N\, \partial^2 M_N(0)$ converges almost surely to the non-singular matrix of expected values

$$E\big[P_\pi(u)^{-2}\, Z^T \beta(u)\, \beta(u)^T Z\big] = \sum_{\nu=1}^{2^n} P_\pi(u^*_\nu)\, [P_\pi(u^*_\nu)]^{-2}\, Z^T \beta(u^*_\nu)\, \beta(u^*_\nu)^T Z = Z^T I Z .$$


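The entry in the $k$th row and $i$th column of $C_N$ is

$$\sum_j \frac{\partial^3}{\partial s_i\, \partial s_j\, \partial s_k} M_N(\xi_{N,k} \hat s_N)\, \hat s_{N,j} = 2 \sum_{a=1}^N [P_{\pi + \xi_{N,k} Z \hat s_N}(u^*_a)]^{-3}\, \big(\beta(u^*_a) \cdot z^k\big) \big(\beta(u^*_a) \cdot z^i\big) \big(\beta(u^*_a) \cdot Z \hat s_N\big) .$$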

Since $\hat s_N$ converges almost surely to zero and the probabilities are bounded away from zero, the matrix $\frac1N C_N$ converges almost surely to a matrix of zeros. Consequently the matrix $D_N$

$$D_N = -\frac1N\, \partial^2 M_N(0) - \frac{1}{2N}\, C_N$$

converges almost surely to non-singular $Z^T I Z$. Thus with probability one $D_N$ is eventually non-singular and eventually both

$$\hat s_N = D_N^{-1}\Big[\frac1N\, \partial M_N(0)\Big]$$

and

$$N^{1/2}\, t^T \hat s_N = t^T D_N^{-1} \big[N^{-1/2}\, \partial M_N(0)\big] .$$

Let

$$Y_N = \begin{cases} t^T D_N^{-1} & \text{if } D_N \text{ is non-singular} \\ 0^T & \text{otherwise} \end{cases}$$

and $X_N = N^{-1/2}\, \partial M_N(0)$ so that $Y_N X_N$ almost surely eventually equals $N^{1/2}\, t \cdot \hat s_N$. If it can be shown that $Y_N X_N$ converges in distribution to $t^T (Z^T I Z)^{-1} X$ where $X$ is multivariate normal with covariance matrix $Z^T I Z$, then it follows that $Y_N X_N$ and $N^{1/2}\, t^T \hat s_N$ are asymptotically normal with variance $t^T (Z^T I Z)^{-1} (Z^T I Z) [t^T (Z^T I Z)^{-1}]^T = t^T (Z^T I Z)^{-1} t$, and the proof will be complete. $Y_N - t^T (Z^T I Z)^{-1} \xrightarrow{a.s.} 0^T$ so $Y_N - t^T (Z^T I Z)^{-1} \xrightarrow{p} 0^T$. Since $X_N$ converges in distribution, $[Y_N - t^T (Z^T I Z)^{-1}] X_N \xrightarrow{p} 0$ and $\xrightarrow{d} 0$. Finally, since $t^T (Z^T I Z)^{-1} X_N \xrightarrow{d} t^T (Z^T I Z)^{-1} X$,

$$Y_N X_N = [Y_N - t^T (Z^T I Z)^{-1}] X_N + t^T (Z^T I Z)^{-1} X_N \xrightarrow{d} t^T (Z^T I Z)^{-1} X . \quad //$$